Four proofs of Gittins' multiarmed bandit theorem

Authors

  • Esther Frostig
  • Gideon Weiss
Abstract

We study four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins’ original exchange argument, Weber’s prevailing charge argument, Whittle’s Lagrangian dual approach, and Bertsimas and Niño-Mora’s proof based on the achievable region approach and generalized conservation laws. We extend the achievable region proof to infinite countable state spaces, by using infinite-dimensional linear programming theory.

Keywords: dynamic programming; bandit problems; Gittins index; linear programming

AMS classification: 90B36; 62L15; 90C40; 49L20; 90C05; 90C27; 90C57
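
As a rough illustration of the index that the priority rule ranks arms by, the sketch below computes Gittins indices for one finite-state arm using the restart-in-state formulation of Katehakis and Veinott. This is a computational device, not one of the four proofs studied in the paper. The function name `gittins_indices`, its parameters, and the two-state example are assumptions introduced here for illustration only.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Gittins index of every state of a single finite-state bandit arm.

    P    : transition matrix as a list of rows (hypothetical input format)
    r    : list of one-step rewards, one per state
    beta : discount factor, assumed to lie strictly between 0 and 1
    """
    n = len(r)
    indices = []
    for i in range(n):
        # Value iteration for the "restart in state i" problem.
        V = [0.0] * n
        for _ in range(max_iter):
            # Q[j]: reward for one step from state j plus discounted future value.
            Q = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                 for j in range(n)]
            # In each state, either continue from j or restart from state i.
            V_new = [max(Q[j], Q[i]) for j in range(n)]
            diff = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            if diff < tol:
                break
        # Scaling by (1 - beta) expresses the index as a reward rate.
        indices.append((1.0 - beta) * V[i])
    return indices


# Illustrative arm: state 0 pays 1 forever; state 1 pays 0, then moves to state 0.
P = [[1.0, 0.0],
     [1.0, 0.0]]
r = [1.0, 0.0]
idx = gittins_indices(P, r, beta=0.9)  # approximately 1.0 and 0.9
```

With several arms, the priority rule would play, at each decision time, the arm whose current state has the largest index. Value iteration is used here for brevity; it is only approximate up to the tolerance `tol`.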

Related resources

Explicit Gittins Indices for a Class of Superdiffusive Processes

We explicitly calculate the dynamic allocation indices (i.e. the Gittins indices) for multiarmed bandit processes driven by superdiffusive noise sources. This class of models generalizes earlier results derived by Karatzas for diffusive processes. In particular, in this soluble class of superdiffusive models the Gittins indices depend explicitly on the noise state.

Partially Observed Markov Decision Process Multiarmed Bandits - Structural Results

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably reducing the computational cost. A suboptimal value iteration algorithm based on Lovejoy’s approximation is presented. We then show ...

Computing an index policy for multiarmed bandits with deadlines

This paper introduces the multiarmed bandit problem with deadlines, which concerns the dynamic selection of a live project to engage out of a portfolio of Markovian bandit projects expiring after given deadlines, to maximize the expected total discounted or undiscounted reward earned. Although the problem is computationally intractable, a natural heuristic policy is obtained by attaching to eac...

A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements

We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...

Computing a Classic Index for Finite-Horizon Bandits

This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon...

Journal title:
  • Annals OR

Volume 241

Publication date: 2016